Attribute Information:

The inputs are as follows:

X1=the transaction date (for example, 2013.250 = March 2013, 2013.500 = June 2013, etc.)

X2=the house age (unit: year)

X3=the distance to the nearest MRT station (unit: meter)

X4=the number of convenience stores within walking distance (integer)

X5=the geographic coordinate, latitude (unit: degree)

X6=the geographic coordinate, longitude (unit: degree)

The output is as follows:

Y=house price per unit area (10,000 New Taiwan Dollars per Ping, where Ping is a local unit; 1 Ping = 3.3 square meters)
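Since Y is in units of 10,000 NTD per Ping, a quick sanity-check conversion to NTD per square meter, using the 1 Ping = 3.3 m² figure above (the Y value below is hypothetical):

```r
# convert a price from units of 10,000 NTD/Ping to NTD per square meter
y_example  <- 38                        # a hypothetical Y value
ntd_per_m2 <- y_example * 10000 / 3.3   # 1 Ping = 3.3 m^2
ntd_per_m2
```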

Some libraries we will use:

library(caret)
library(tidyverse)

We load the data, excluding \(x5\) and \(x6\) (the geographic coordinates), obtain summaries of the variables, and normalize the data for modeling:

setwd('C:/Users/Oliver/Documents/9/TAE/reto 3 superficie de respuesta knn')
datos_completos <- readxl::read_xlsx('Real estate valuation data set (1).xlsx',
                           col_names = c('n', 'x1', 'x2', 'x3', 'x4', 'x5', 'x6', 'y'), skip = 1)
datos <- datos_completos %>% 
              select('x1', 'x2', 'x3', 'x4', 'y')

# a quick look at the data:
dim(datos)
## [1] 414   5
names(datos)
## [1] "x1" "x2" "x3" "x4" "y"
head(datos)
tail(datos)
summary(datos)
##        x1             x2               x3                x4        
##  Min.   :2013   Min.   : 0.000   Min.   :  23.38   Min.   : 0.000  
##  1st Qu.:2013   1st Qu.: 9.025   1st Qu.: 289.32   1st Qu.: 1.000  
##  Median :2013   Median :16.100   Median : 492.23   Median : 4.000  
##  Mean   :2013   Mean   :17.713   Mean   :1083.89   Mean   : 4.094  
##  3rd Qu.:2013   3rd Qu.:28.150   3rd Qu.:1454.28   3rd Qu.: 6.000  
##  Max.   :2014   Max.   :43.800   Max.   :6488.02   Max.   :10.000  
##        y         
##  Min.   :  7.60  
##  1st Qu.: 27.70  
##  Median : 38.45  
##  Mean   : 37.98  
##  3rd Qu.: 46.60  
##  Max.   :117.50
str(datos_completos)
## tibble [414 x 8] (S3: tbl_df/tbl/data.frame)
##  $ n : num [1:414] 1 2 3 4 5 6 7 8 9 10 ...
##  $ x1: num [1:414] 2013 2013 2014 2014 2013 ...
##  $ x2: num [1:414] 32 19.5 13.3 13.3 5 7.1 34.5 20.3 31.7 17.9 ...
##  $ x3: num [1:414] 84.9 306.6 562 562 390.6 ...
##  $ x4: num [1:414] 10 9 5 5 5 3 7 6 1 3 ...
##  $ x5: num [1:414] 25 25 25 25 25 ...
##  $ x6: num [1:414] 122 122 122 122 122 ...
##  $ y : num [1:414] 37.9 42.2 47.3 54.8 43.1 32.1 40.3 46.7 18.8 22.1 ...
table(datos_completos$x4)
## 
##  0  1  2  3  4  5  6  7  8  9 10 
## 67 46 24 46 31 67 37 31 30 25 10
# normalize the data so that differences in scale do not distort the model or the
# variability of the validation error; store each column's mean and standard deviation:
datoc <- scale(datos[, c("x1", "x2", "x3", "x4", "y")], center = TRUE, scale = TRUE)
centro <- attr(datoc, "scaled:center")  # column means
escala <- attr(datoc, "scaled:scale")   # column standard deviations
datos <- as.data.frame(datoc)
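A minimal sketch, on a toy vector, of how the stored center and scale attributes let us map standardized values (and later, standardized predictions) back to the original units:

```r
# standardize a toy vector exactly as scale() does for the data above
v  <- c(10, 20, 30, 40)
vs <- scale(v, center = TRUE, scale = TRUE)
ctr <- attr(vs, "scaled:center")  # mean of v
scl <- attr(vs, "scaled:scale")   # standard deviation of v
# invert the transformation: standardized value * sd + mean
v_back <- as.numeric(vs) * scl + ctr
all.equal(v_back, v)   # TRUE
```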

Finding the optimal k:

All four variables are used; cross-validation with 10 folds, repeated 3 times, is applied, and we obtain first the optimal k and then the summary of the selection criteria:

# 10-fold CV repeated 3 times will be used:
trctrl <- trainControl(method = "repeatedcv", number = 10, repeats = 3)

# selection of the optimal k with all 4 predictors, using the repeated CV above, for k = 1 to 30 neighbors:
knn_fit <- train(y ~., data = datos, method = "knn",
                  trControl=trctrl,
                  preProcess = c( "knnImpute"),
                 tuneGrid   = expand.grid(k = 1:30))

# the results under the different selection criteria and the best-performing number of neighbors k:
knn_fit$bestTune
knn_fit
## k-Nearest Neighbors 
## 
## 414 samples
##   4 predictor
## 
## Pre-processing: nearest neighbor imputation (4), centered (4), scaled (4) 
## Resampling: Cross-Validated (10 fold, repeated 3 times) 
## Summary of sample sizes: 372, 373, 374, 373, 371, 371, ... 
## Resampling results across tuning parameters:
## 
##   k   RMSE       Rsquared   MAE      
##    1  0.8006427  0.4882675  0.5071323
##    2  0.7173713  0.5387820  0.4757051
##    3  0.6691660  0.5829475  0.4566476
##    4  0.6388517  0.6074202  0.4421585
##    5  0.6335631  0.6093243  0.4461503
##    6  0.6337376  0.6066427  0.4459074
##    7  0.6277840  0.6133057  0.4434905
##    8  0.6254149  0.6158372  0.4423897
##    9  0.6252515  0.6151415  0.4445268
##   10  0.6277598  0.6114284  0.4453491
##   11  0.6280838  0.6107833  0.4456565
##   12  0.6272345  0.6122427  0.4444843
##   13  0.6290459  0.6101078  0.4457232
##   14  0.6299378  0.6089757  0.4458351
##   15  0.6300769  0.6088571  0.4465984
##   16  0.6286310  0.6108917  0.4465676
##   17  0.6292418  0.6101910  0.4476461
##   18  0.6299073  0.6101140  0.4489681
##   19  0.6283603  0.6132352  0.4475913
##   20  0.6276996  0.6145322  0.4477876
##   21  0.6278042  0.6149987  0.4471886
##   22  0.6287392  0.6138429  0.4474194
##   23  0.6286917  0.6143171  0.4476490
##   24  0.6289599  0.6145361  0.4481139
##   25  0.6303316  0.6132867  0.4494743
##   26  0.6319519  0.6115863  0.4504319
##   27  0.6342660  0.6093138  0.4519364
##   28  0.6354956  0.6078981  0.4529432
##   29  0.6359885  0.6080544  0.4527112
##   30  0.6382790  0.6056096  0.4553879
## 
## RMSE was used to select the optimal model using the smallest value.
## The final value used for the model was k = 9.

Selecting the optimal number of variables:

Here we will simply use the correlation matrix:

# remove redundant features
correlationMatrix <- cor(datos[, 1:4])
# find attributes that are highly correlated (cutoff used here: 0.6)
highlyCorrelated <- findCorrelation(correlationMatrix, cutoff = 0.6)
# print the indexes of the highly correlated attributes
print(highlyCorrelated)
## [1] 3

Feature x3 could be considered for removal, since its correlation with x4 is -0.60, but that does not exceed the usual 0.75 threshold for discarding a variable, so it is kept.
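A small self-contained sketch of how `findCorrelation` behaves on a known correlation matrix (the variable names and values are made up for illustration):

```r
library(caret)

# a 3x3 correlation matrix where "b" and "c" are strongly correlated (0.9)
m <- matrix(c(1.0, 0.1, 0.1,
              0.1, 1.0, 0.9,
              0.1, 0.9, 1.0),
            nrow = 3,
            dimnames = list(c("a", "b", "c"), c("a", "b", "c")))
# index of the variable suggested for removal at this cutoff
findCorrelation(m, cutoff = 0.75)
```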

Select the best two-variable model and plot the response surface:

Six models were fitted, one for each pair of variables; for each we obtain the list of error measures for model selection, the best k, and the plot of k vs. RMSE:

This model uses \(x1\) and \(x2\), the transaction date and the house age:

knn_fit1 <- train(y ~., data = datos[,c(1,2,5)], method = "knn",
                 trControl=trctrl,
                 preProcess = c( "knnImpute"),
                 tuneGrid   = expand.grid(k = 1:30))

knn_fit1
## k-Nearest Neighbors 
## 
## 414 samples
##   2 predictor
## 
## Pre-processing: nearest neighbor imputation (2), centered (2), scaled (2) 
## Resampling: Cross-Validated (10 fold, repeated 3 times) 
## Summary of sample sizes: 374, 372, 372, 373, 372, 373, ... 
## Resampling results across tuning parameters:
## 
##   k   RMSE       Rsquared    MAE      
##    1  1.2014363  0.06386558  0.9281103
##    2  1.0998786  0.07568688  0.8777499
##    3  1.0332487  0.09571103  0.8292164
##    4  1.0012218  0.10304622  0.8092729
##    5  0.9721354  0.12248939  0.7927042
##    6  0.9521052  0.13501577  0.7812509
##    7  0.9269530  0.16427617  0.7578738
##    8  0.9212295  0.16724910  0.7523449
##    9  0.9203969  0.16553200  0.7519513
##   10  0.9174246  0.16803883  0.7485100
##   11  0.9191615  0.16289694  0.7499328
##   12  0.9217200  0.15759610  0.7496515
##   13  0.9191882  0.16214017  0.7435085
##   14  0.9206462  0.15785317  0.7430251
##   15  0.9184748  0.16186967  0.7407515
##   16  0.9210511  0.15734498  0.7431701
##   17  0.9209989  0.15724939  0.7442800
##   18  0.9190017  0.16137771  0.7440804
##   19  0.9195514  0.16004081  0.7446788
##   20  0.9187419  0.16102346  0.7437728
##   21  0.9163529  0.16446204  0.7406460
##   22  0.9147314  0.16651466  0.7374160
##   23  0.9149852  0.16645358  0.7350820
##   24  0.9151201  0.16562691  0.7349973
##   25  0.9152559  0.16429431  0.7346025
##   26  0.9149201  0.16529079  0.7335555
##   27  0.9159015  0.16335138  0.7349590
##   28  0.9149876  0.16443964  0.7339200
##   29  0.9144716  0.16518572  0.7340239
##   30  0.9133628  0.16743740  0.7345480
## 
## RMSE was used to select the optimal model using the smallest value.
## The final value used for the model was k = 30.
knn_fit1$bestTune
plot(knn_fit1)

This model uses \(x1\) and \(x3\), the transaction date and the distance to the nearest MRT station:

knn_fit2 <- train(y ~., data = datos[,c(1,3,5)], method = "knn",
                  trControl=trctrl,
                  preProcess = c("knnImpute"),
                  tuneGrid   = expand.grid(k = 1:30))


knn_fit2
## k-Nearest Neighbors 
## 
## 414 samples
##   2 predictor
## 
## Pre-processing: nearest neighbor imputation (2), centered (2), scaled (2) 
## Resampling: Cross-Validated (10 fold, repeated 3 times) 
## Summary of sample sizes: 373, 372, 372, 373, 373, 372, ... 
## Resampling results across tuning parameters:
## 
##   k   RMSE       Rsquared   MAE      
##    1  0.8146788  0.4772282  0.5687585
##    2  0.7282263  0.5097028  0.5380944
##    3  0.7073080  0.5263099  0.5086082
##    4  0.6941858  0.5411176  0.4977764
##    5  0.6740585  0.5573220  0.4850428
##    6  0.6634569  0.5660128  0.4809047
##    7  0.6551975  0.5749496  0.4747094
##    8  0.6502581  0.5804701  0.4693092
##    9  0.6466050  0.5854894  0.4656072
##   10  0.6458026  0.5866665  0.4681836
##   11  0.6500215  0.5813941  0.4708590
##   12  0.6522548  0.5783847  0.4724740
##   13  0.6522494  0.5787401  0.4720805
##   14  0.6552570  0.5747583  0.4728879
##   15  0.6566907  0.5727105  0.4740091
##   16  0.6573500  0.5714565  0.4740474
##   17  0.6584269  0.5698987  0.4747750
##   18  0.6568674  0.5722389  0.4734429
##   19  0.6583392  0.5701333  0.4739853
##   20  0.6580490  0.5707972  0.4736735
##   21  0.6598254  0.5684459  0.4757288
##   22  0.6603250  0.5681418  0.4762749
##   23  0.6606551  0.5685454  0.4762349
##   24  0.6608604  0.5693304  0.4761891
##   25  0.6595726  0.5715128  0.4757649
##   26  0.6592928  0.5724495  0.4763635
##   27  0.6602296  0.5719067  0.4770055
##   28  0.6603806  0.5723777  0.4782354
##   29  0.6607952  0.5727561  0.4787423
##   30  0.6627074  0.5703096  0.4807798
## 
## RMSE was used to select the optimal model using the smallest value.
## The final value used for the model was k = 10.
knn_fit2$bestTune
plot(knn_fit2)

This model uses \(x1\) and \(x4\), the transaction date and the number of convenience stores:

knn_fit3 <- train(y ~., data = datos[,c(1,4,5)], method = "knn",
                  trControl=trctrl,
                  preProcess = c("knnImpute"),
                  tuneGrid   = expand.grid(k = 1:30))


knn_fit3
## k-Nearest Neighbors 
## 
## 414 samples
##   2 predictor
## 
## Pre-processing: nearest neighbor imputation (2), centered (2), scaled (2) 
## Resampling: Cross-Validated (10 fold, repeated 3 times) 
## Summary of sample sizes: 373, 371, 373, 373, 371, 373, ... 
## Resampling results across tuning parameters:
## 
##   k   RMSE       Rsquared   MAE      
##    1  0.9603181  0.2149786  0.7131368
##    2  0.9284028  0.2299306  0.6941330
##    3  0.8617405  0.2909020  0.6537014
##    4  0.8406814  0.3169460  0.6330827
##    5  0.8260652  0.3352410  0.6183191
##    6  0.8134487  0.3523661  0.6093366
##    7  0.8120919  0.3530736  0.6105054
##    8  0.8108423  0.3537278  0.6095059
##    9  0.8093801  0.3542346  0.6103979
##   10  0.8075009  0.3562714  0.6074556
##   11  0.8040508  0.3611332  0.6044225
##   12  0.8015538  0.3652740  0.6010854
##   13  0.8014128  0.3662174  0.6006616
##   14  0.8019617  0.3654012  0.6018455
##   15  0.8024146  0.3632519  0.6033059
##   16  0.8021896  0.3638336  0.6029058
##   17  0.8019623  0.3638650  0.6022095
##   18  0.8011548  0.3650431  0.6008924
##   19  0.8001597  0.3659560  0.5999592
##   20  0.7998914  0.3661064  0.6001544
##   21  0.7981622  0.3692635  0.6005220
##   22  0.7966635  0.3717861  0.5999952
##   23  0.7951226  0.3741693  0.5998670
##   24  0.7935436  0.3766966  0.5991525
##   25  0.7936396  0.3764988  0.5992040
##   26  0.7934983  0.3774351  0.5992142
##   27  0.7938036  0.3771365  0.5984205
##   28  0.7938205  0.3775952  0.5978549
##   29  0.7953468  0.3757446  0.5977999
##   30  0.7950621  0.3762792  0.5973261
## 
## RMSE was used to select the optimal model using the smallest value.
## The final value used for the model was k = 26.
knn_fit3$bestTune
plot(knn_fit3)

This model uses \(x2\) and \(x3\), the house age and the distance to the nearest MRT station:

knn_fit4 <- train(y ~., data = datos[,c(2,3,5)], method = "knn",
                  trControl=trctrl,
                  preProcess = c("knnImpute"),
                  tuneGrid   = expand.grid(k = 1:30))


knn_fit4
## k-Nearest Neighbors 
## 
## 414 samples
##   2 predictor
## 
## Pre-processing: nearest neighbor imputation (2), centered (2), scaled (2) 
## Resampling: Cross-Validated (10 fold, repeated 3 times) 
## Summary of sample sizes: 373, 373, 373, 372, 372, 372, ... 
## Resampling results across tuning parameters:
## 
##   k   RMSE       Rsquared   MAE      
##    1  0.6779796  0.6046783  0.4364273
##    2  0.6062891  0.6547939  0.4004069
##    3  0.5964975  0.6637222  0.3935846
##    4  0.5855952  0.6710949  0.3962406
##    5  0.5802261  0.6728412  0.3981383
##    6  0.5788448  0.6726032  0.3998233
##    7  0.5801730  0.6697664  0.4001744
##    8  0.5873903  0.6601935  0.4060002
##    9  0.5966296  0.6494412  0.4115472
##   10  0.6003080  0.6436738  0.4143625
##   11  0.6027932  0.6398162  0.4174723
##   12  0.6048500  0.6365005  0.4185478
##   13  0.6059422  0.6344401  0.4186361
##   14  0.6073554  0.6324337  0.4182949
##   15  0.6126021  0.6258508  0.4214043
##   16  0.6140236  0.6233158  0.4243967
##   17  0.6154903  0.6214856  0.4248947
##   18  0.6165370  0.6198429  0.4250147
##   19  0.6198005  0.6157395  0.4278172
##   20  0.6225504  0.6123909  0.4298867
##   21  0.6250641  0.6093549  0.4312658
##   22  0.6271722  0.6068576  0.4323722
##   23  0.6285783  0.6050569  0.4328658
##   24  0.6298062  0.6034751  0.4329338
##   25  0.6310454  0.6022888  0.4342398
##   26  0.6333726  0.5993215  0.4361513
##   27  0.6346202  0.5976564  0.4375531
##   28  0.6353271  0.5969520  0.4388689
##   29  0.6367628  0.5952756  0.4412017
##   30  0.6380712  0.5937609  0.4436360
## 
## RMSE was used to select the optimal model using the smallest value.
## The final value used for the model was k = 6.
knn_fit4$bestTune
plot(knn_fit4)

This model uses \(x2\) and \(x4\), the house age and the number of convenience stores:

knn_fit5 <- train(y ~., data = datos[,c(2,4,5)], method = "knn",
                  trControl=trctrl,
                  preProcess = c("knnImpute"),
                  tuneGrid   = expand.grid(k = 1:30))


knn_fit5
## k-Nearest Neighbors 
## 
## 414 samples
##   2 predictor
## 
## Pre-processing: nearest neighbor imputation (2), centered (2), scaled (2) 
## Resampling: Cross-Validated (10 fold, repeated 3 times) 
## Summary of sample sizes: 371, 373, 372, 374, 372, 372, ... 
## Resampling results across tuning parameters:
## 
##   k   RMSE       Rsquared   MAE      
##    1  0.7941620  0.4860517  0.5335523
##    2  0.7283709  0.5116053  0.5038691
##    3  0.7209468  0.5153381  0.4915121
##    4  0.7233108  0.5098283  0.4968245
##    5  0.7190457  0.5156791  0.4952390
##    6  0.7115733  0.5195778  0.4955930
##    7  0.7118602  0.5150910  0.4983763
##    8  0.7115441  0.5139492  0.5035127
##    9  0.7098944  0.5143357  0.5063993
##   10  0.7094687  0.5135527  0.5100065
##   11  0.7064946  0.5153350  0.5104105
##   12  0.7083284  0.5113761  0.5130074
##   13  0.7047724  0.5151308  0.5110381
##   14  0.7005023  0.5196524  0.5118062
##   15  0.6992040  0.5204712  0.5117025
##   16  0.6981346  0.5213023  0.5123603
##   17  0.6972081  0.5224729  0.5133877
##   18  0.6961799  0.5232387  0.5126466
##   19  0.6961694  0.5233630  0.5113874
##   20  0.6961142  0.5234840  0.5111524
##   21  0.6967618  0.5220646  0.5128291
##   22  0.6969474  0.5219449  0.5146789
##   23  0.6966658  0.5224028  0.5171892
##   24  0.6958560  0.5232891  0.5175484
##   25  0.6960273  0.5234671  0.5186474
##   26  0.6950116  0.5248487  0.5185443
##   27  0.6957288  0.5241595  0.5192062
##   28  0.6961582  0.5233361  0.5195045
##   29  0.6974086  0.5212987  0.5208973
##   30  0.6980221  0.5204005  0.5216840
## 
## RMSE was used to select the optimal model using the smallest value.
## The final value used for the model was k = 26.
knn_fit5$bestTune[[1]]
## [1] 26
plot(knn_fit5)

This model uses \(x3\) and \(x4\), the distance to the nearest MRT station and the number of convenience stores:

knn_fit6 <- train(y ~., data = datos[,c(3,4,5)], method = "knn",
                  trControl=trctrl,
                  preProcess = c("knnImpute"),
                  tuneGrid   = expand.grid(k = 1:30))


knn_fit6
## k-Nearest Neighbors 
## 
## 414 samples
##   2 predictor
## 
## Pre-processing: nearest neighbor imputation (2), centered (2), scaled (2) 
## Resampling: Cross-Validated (10 fold, repeated 3 times) 
## Summary of sample sizes: 372, 371, 371, 373, 374, 374, ... 
## Resampling results across tuning parameters:
## 
##   k   RMSE       Rsquared   MAE      
##    1  0.6464270  0.6264692  0.4235647
##    2  0.6307216  0.6294148  0.4288128
##    3  0.6257927  0.6357139  0.4303352
##    4  0.6115260  0.6410582  0.4264299
##    5  0.6090733  0.6393778  0.4301734
##    6  0.6144900  0.6338717  0.4346380
##    7  0.6304694  0.6141804  0.4405055
##    8  0.6287945  0.6142301  0.4401579
##    9  0.6304924  0.6127300  0.4436554
##   10  0.6341234  0.6081175  0.4479324
##   11  0.6359197  0.6058804  0.4497993
##   12  0.6432751  0.5983619  0.4557168
##   13  0.6461530  0.5953084  0.4602024
##   14  0.6514271  0.5896653  0.4644482
##   15  0.6536482  0.5874107  0.4682819
##   16  0.6572694  0.5840934  0.4710225
##   17  0.6574170  0.5824101  0.4713523
##   18  0.6581613  0.5806927  0.4699214
##   19  0.6582972  0.5790017  0.4695254
##   20  0.6576748  0.5784395  0.4707329
##   21  0.6568011  0.5794962  0.4707687
##   22  0.6551140  0.5811893  0.4697476
##   23  0.6563809  0.5797105  0.4708306
##   24  0.6565428  0.5790174  0.4707390
##   25  0.6568729  0.5782917  0.4704014
##   26  0.6571568  0.5766005  0.4701272
##   27  0.6536598  0.5812845  0.4667442
##   28  0.6525734  0.5825621  0.4660138
##   29  0.6545386  0.5795937  0.4675761
##   30  0.6553753  0.5780943  0.4690605
## 
## RMSE was used to select the optimal model using the smallest value.
## The final value used for the model was k = 5.
knn_fit6$bestTune
plot(knn_fit6)

This summarizes the fits: for each model we print its pair of variables and then its selection criteria at the optimal k:

for (i in list(knn_fit1,knn_fit2,knn_fit3,knn_fit4,knn_fit5,knn_fit6)){
  print(i$coefnames)
  print(i$results %>% filter(k==i$bestTune[[1]]))
}
## [1] "x1" "x2"
##    k      RMSE  Rsquared      MAE    RMSESD RsquaredSD      MAESD
## 1 30 0.9133628 0.1674374 0.734548 0.1233471 0.08345744 0.07129405
## [1] "x1" "x3"
##    k      RMSE  Rsquared       MAE    RMSESD RsquaredSD     MAESD
## 1 10 0.6458026 0.5866665 0.4681836 0.1486156 0.09797457 0.0746464
## [1] "x1" "x4"
##    k      RMSE  Rsquared       MAE    RMSESD RsquaredSD      MAESD
## 1 26 0.7934983 0.3774351 0.5992142 0.1683307   0.120036 0.07136043
## [1] "x2" "x3"
##   k      RMSE  Rsquared       MAE    RMSESD RsquaredSD      MAESD
## 1 6 0.5788448 0.6726032 0.3998233 0.1490725  0.1154914 0.05382897
## [1] "x2" "x4"
##    k      RMSE  Rsquared       MAE    RMSESD RsquaredSD      MAESD
## 1 26 0.6950116 0.5248487 0.5185443 0.1893983  0.1432126 0.07771896
## [1] "x3" "x4"
##   k      RMSE  Rsquared       MAE   RMSESD RsquaredSD      MAESD
## 1 5 0.6090733 0.6393778 0.4301734 0.166965   0.131845 0.06491474
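The printed summaries above can be collapsed into a single table sorted by RMSE; the values are hard-coded here from the loop output, for illustration only:

```r
# per-model best k and RMSE, copied from the resampling output above
comparison <- data.frame(
  vars = c("x1+x2", "x1+x3", "x1+x4", "x2+x3", "x2+x4", "x3+x4"),
  k    = c(30, 10, 26, 6, 26, 5),
  RMSE = c(0.9134, 0.6458, 0.7935, 0.5788, 0.6950, 0.6091)
)
# sort so the best model (lowest RMSE) comes first
comparison[order(comparison$RMSE), ]
```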

According to these results, focusing first on RMSE, the pair of variables with the lowest error is x2 and x3:

We plot the response surface:

medias<-knn_fit4$preProcess$mean
dsv_es<-knn_fit4$preProcess$std

x2 <- seq(min(datos$x2), max(datos$x2), length.out = 100)
x3 <- seq(min(datos$x3), max(datos$x3), length.out = 100)
test.df <- expand.grid(x2 = x2, x3 = x3)
test_pred <- predict(knn_fit4, newdata = test.df)
test.df$y <- test_pred
z <- matrix(test_pred, nrow = length(x2), ncol = length(x3))
persp(x2, x3, z, xlab = "House age", ylab = "Distance to MRT", zlab = "Price",
      main = "Response surface for the two-variable model", theta = 135, shade = 0.3)
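Because the model was fitted on standardized data, the predicted surface is in standard-deviation units; a sketch of mapping predictions back to the original price scale with the `centro`/`escala` values saved earlier (the numbers below are stand-ins: 37.98 is the mean of y from the summary above, and the sd is an illustrative value):

```r
# the fitted surface is in standardized units; convert back to 10,000 NTD/Ping
z_std    <- c(-1, 0, 1)   # example standardized predictions
centro_y <- 37.98         # mean of y, from the data summary above
escala_y <- 13.6          # sd of y (illustrative value)
z_price  <- z_std * escala_y + centro_y
z_price
```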

With plotly the surface is easier to inspect:

library(plotly)
## Warning: package 'plotly' was built under R version 4.0.4
## 
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
## 
##     last_plot
## The following object is masked from 'package:stats':
## 
##     filter
## The following object is masked from 'package:graphics':
## 
##     layout
p <- plot_ly(x = x2, y = x3, z = t(z))   # plotly surfaces index z as z[y, x]
p <- add_surface(p)
p <- layout(p, title = 'Price vs MRT distance and house age',
            scene = list(xaxis = list(title = "House age"),
                         yaxis = list(title = "Distance to MRT (m)"),
                         zaxis = list(title = "Price")))
p

Here we can see the region of the data over which the response surface is valid:

ggplot(datos_completos, aes(x = x2, y = x3)) +
  geom_point() +
  labs(title = "House age vs distance", x = "House age", y = "Distance to MRT (m)") +
  theme_light()